NSF PAR Search | NSF Public Access Repository

Scaling Inference-Efficient Language Models

Bian, Song; Yan, Minghao; Venkataraman, Shivaram (July 2025, Proceedings of the 42nd International Conference on Machine Learning)

Free, publicly-accessible full text available July 13, 2026

Does compressing activations help model parallel training?

Bian, Song; Li, Dacheng; Wang, Hongyi; Xing, Eric P; Venkataraman, Shivaram (May 2024, Seventh Annual Conference on Machine Learning and Systems)

Foundation models (FMs) have superior performance across a wide array of machine learning tasks. The training of these models typically involves model parallelism (MP) to navigate the constraints of GPU memory capacity. However, MP strategies involve transmitting model activations between GPUs, which can hinder training speed in large clusters. Previous research has examined gradient compression in data-parallel contexts, but its applicability in MP settings remains largely unexplored. In this paper, we investigate the unique characteristics of compression in MP and study why strategies from gradient compression might not be directly applicable to MP scenarios. Subsequently, to systematically understand the capabilities and limitations of Model Parallelism Compression, we present a benchmarking framework, MCBench. MCBench includes not only four major categories of compression algorithms but also several widely used models spanning language and vision tasks on a well-established distributed training framework, Megatron-LM. Using MCBench, we conduct the first comprehensive empirical study, which encompasses both the fine-tuning and pre-training of FMs. We probe over 200 unique training configurations and present results using 10 widely used datasets. To understand how the benefits of compression scale with model size and cluster size, we propose a novel cost model designed specifically for training with MP compression. The insights derived from our findings can help direct the future development of new MP compression algorithms for distributed training. Our code is available at https://github.com/uw-mad-dash/MCBench
Full Text Available

AutoPrivacy: Automated Layer-wise Parameter Selection for Secure Neural Network Inference

Lou, Qian; Bian, Song; Jiang, Lei (October 2020, Advances in Neural Information Processing Systems)

The Hybrid Privacy-Preserving Neural Network (HPPNN), which implements linear layers with Homomorphic Encryption (HE) and nonlinear layers with Garbled Circuits (GC), is one of the most promising secure solutions for emerging Machine Learning as a Service (MLaaS). Unfortunately, an HPPNN suffers from long inference latency, e.g., ∼100 seconds per image, which makes MLaaS unsatisfactory. Because the HE-based linear layers of an HPPNN account for 93% of inference latency, it is critical to select a set of HE parameters that minimizes the computational overhead of the linear layers. Prior HPPNNs over-pessimistically select huge HE parameters to maintain large noise budgets, since they use the same set of HE parameters for an entire network and ignore the network's error tolerance. In this paper, for fast and accurate secure neural network inference, we propose an automated layer-wise parameter selector, AutoPrivacy, that leverages deep reinforcement learning to automatically determine a set of HE parameters for each linear layer in an HPPNN. The learning-based HE parameter selection policy outperforms the conventional rule-based HE parameter selection policy. Compared to prior HPPNNs, AutoPrivacy-optimized HPPNNs reduce inference latency by 53% to 70% with negligible loss of accuracy.
Full Text Available

DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python

https://doi.org/10.1145/3448016.3457330

Peng, Jinglin; Wu, Weiyuan; Lockhart, Brandon; Bian, Song; Yan, Jing Nathan; Xu, Linghao; Chi, Zhixuan; Rzeszotarski, Jeffrey M.; Wang, Jiannan (June 2021, SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data)

Exploratory Data Analysis (EDA) is a crucial step in any data science project. However, existing Python libraries fall short in helping data scientists complete common EDA tasks for statistical modeling. Their API design is either too low level, optimized for plotting rather than EDA, or too high level, making it hard to specify more fine-grained EDA tasks. In response, we propose DataPrep.EDA, a novel task-centric EDA system in Python. DataPrep.EDA allows data scientists to declaratively specify a wide range of EDA tasks at different granularities with a single function call. We identify a number of challenges in implementing DataPrep.EDA and propose effective solutions to improve the scalability, usability, and customizability of the system. In particular, we discuss some lessons learned from using Dask to build the data processing pipelines for EDA tasks and describe our approaches to accelerate the pipelines. We conduct extensive experiments to compare DataPrep.EDA with Pandas-profiling, the state-of-the-art EDA system in Python. The experiments show that DataPrep.EDA significantly outperforms Pandas-profiling in terms of both speed and user experience. DataPrep.EDA is open-sourced as an EDA component of DataPrep: https://github.com/sfu-db/dataprep.
Full Text Available

NASS: Optimizing Secure Inference via Neural Architecture Search

https://doi.org/10.3233/FAIA200288

Bian, Song; Jiang, Weiwen; Lu, Qing; Shi, Yiyu; Sato, Takashi (September 2020, European Conference on Artificial Intelligence)
Full Text Available